Automatic Extraction of Complex Predicates in Bengali
Abstract
This paper presents the automatic extraction of Complex Predicates (CPs) in Bengali, with a special focus on compound verbs (Verb + Verb) and conjunct verbs (Noun/Adjective + Verb). The lexical patterns of compound and conjunct verbs are extracted using shallow morphological information and available seed lists of verbs. The lexical scopes of compound and conjunct verbs within consecutive sequences of CPs are also identified. A fine-grained error analysis based on a confusion matrix highlights some insufficiencies of the lexical patterns and the impact of the different constraints used to identify CPs. The system achieves F-scores of 75.73% and 77.92% for compound verbs and 89.90% and 89.66% for conjunct verbs, respectively, on two types of Bengali corpora.
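The pattern-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the `Token` type, the coarse tags, the romanized seed lists, and the adjacency-only rule are all assumptions made for illustration, and the abstract does not specify the actual patterns or constraints.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    surface: str  # word form as it appears in the text (romanized here)
    lemma: str    # root form, as a shallow morphological analyzer might give
    pos: str      # coarse tag: "N", "ADJ", "V", or anything else


# Hypothetical seed lists of light verbs (romanized, illustrative only;
# these are NOT the seed lists used in the paper).
COMPOUND_LIGHT_VERBS = {"phela", "deoya", "jaoya"}
CONJUNCT_LIGHT_VERBS = {"kara", "haoya"}


def extract_cps(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (label, phrase) pairs for adjacent-token CP candidates."""
    found = []
    for first, second in zip(tokens, tokens[1:]):
        if second.pos != "V":
            continue  # both patterns end in a verb
        phrase = f"{first.surface} {second.surface}"
        if first.pos == "V" and second.lemma in COMPOUND_LIGHT_VERBS:
            found.append(("compound", phrase))   # Verb + Verb
        elif first.pos in {"N", "ADJ"} and second.lemma in CONJUNCT_LIGHT_VERBS:
            found.append(("conjunct", phrase))   # Noun/Adjective + Verb
    return found


# Toy token sequence (not a real corpus sentence).
tokens_demo = [
    Token("se", "se", "PRON"),
    Token("bharsa", "bharsa", "N"),
    Token("kare", "kara", "V"),
    Token("mere", "mara", "V"),
    Token("phelechhe", "phela", "V"),
]
print(extract_cps(tokens_demo))
```

A rule this simple over-generates, since many Noun + Verb sequences are not conjunct verbs, which is presumably why the paper adds further constraints and reports an error analysis.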
Similar resources
The Interlanguage of Persian Learners of Italian: a Focus on Complex Predicates
This paper aims at investigating the acquisition of Italian complex predicates by native speakers of Persian. Complex predication is not as pervasive a phenomenon in Italian as it is in Persian. Yet Italian native speakers use complex predicates productively; spontaneous data show that Persian learners of Italian seem to be perfectly aware of Italian complex predicates and use this familiar fea...
Bengali text summarization by sentence extraction
Text summarization is a process to produce an abstract or a summary by selecting a significant portion of the information from one or more texts. In an automatic text summarization process, a text is given to the computer and the computer returns a shorter, less redundant extract or abstract of the original text(s). Many techniques have been developed for summarizing English text(s). But, a very f...
Automatic classification of Bengali sentences based on sense definitions present in Bengali WordNet
Based on the sense definition of words available in the Bengali WordNet, an attempt is made to classify the Bengali sentences automatically into different groups in accordance with their underlying senses. The input sentences are collected from 50 different categories of the Bengali text corpus developed in the TDIL project of the Govt. of India, while information about the different senses of ...
Automatic Identification of Bengali Noun-Noun Compounds Using Random Forest
This paper presents a supervised machine learning approach that uses the Random Forest algorithm to recognize Bengali noun-noun compounds as multiword expressions (MWEs) in a Bengali corpus. Our proposed approach to MWE recognition has two steps: (1) extraction of candidate multi-word expressions using Chunk information and various heuristic rules and (2) training the ...
Hindi Compound Verbs and their Automatic Extraction
We analyse Hindi complex predicates and propose linguistic tests for their detection. This analysis enables us to identify a category of V+V complex predicates called lexical compound verbs (LCpdVs) which need to be stored in the dictionary. Based on the linguistic analysis, a simple automatic method has been devised for extracting LCpdVs from corpora. We achieve an accuracy of around 98% in th...